Poster: goharris Date: Dec 21, 2011 12:38pm
Forum: web Subject: We were unable to get the robots.txt document to display this page.

There was a thread on this problem in October 2011 in this forum. I am another person relying on Internet Archive for research (we should all donate money) - and suddenly, on Tuesday morning (PST), December 20, I got this message ("We were unable to get the robots.txt document to display this page.") for www.contactpoint.ca and www.counselling.net. Both sites are active. I don't think either of them had a robots.txt file before. Did something happen at archive.org? Any suggestions? Any fixes? Thanks.

Poster: Jeff Kaplan Date: Dec 21, 2011 1:22pm
Forum: web Subject: Re: We were unable to get the robots.txt document to display this page.

The problem is with the servers where the robots.txt files live, not with the Wayback Machine:

http://www.contactpoint.ca/robots.txt results in:
Not Found
The requested URL /robots.txt was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

http://www.counselling.net/robots.txt results in:
Not Found
The requested URL /jnew/robots.txt was not found on this server.
Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
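
Anyone who wants to repeat this check for their own site could do it with a few lines of Python. This is only a rough sketch of requesting a site's robots.txt and reporting the HTTP status; it is not how the Wayback Machine itself fetches the file, and the function names robots_url and robots_status are made up for this example.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_url):
    """Build the robots.txt URL for the host named in site_url."""
    # Accept bare host names such as "www.contactpoint.ca" (assumes plain http).
    if "://" not in site_url:
        site_url = "http://" + site_url
    parts = urlsplit(site_url)
    # robots.txt always lives at the root of the host, whatever the page path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def robots_status(site_url, timeout=10):
    """Return the HTTP status code for the site's robots.txt.

    Returns None when the server could not be reached at all.
    """
    try:
        with urllib.request.urlopen(robots_url(site_url), timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers land here, e.g. 404 for a missing file.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


if __name__ == "__main__":
    for site in ("www.contactpoint.ca", "www.counselling.net"):
        print(site, robots_status(site))
```

A 404 here corresponds to the "Not Found" responses quoted above; None would mean the server did not answer at all.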

Poster: MrZer0 Date: Feb 29, 2012 9:16am
Forum: web Subject: Re: We were unable to get the robots.txt document to display this page.

Hey Jeff,

Is there a way to find out on which server the robots.txt file is hosted?
Maybe it would be useful to check if the server is online.

Poster: Jeff Kaplan Date: Feb 29, 2012 10:13am
Forum: web Subject: Re: We were unable to get the robots.txt document to display this page.

We aren't able to do that.

Poster: goharris Date: Dec 22, 2011 3:23pm
Forum: web Subject: Re: We were unable to get the robots.txt document to display this page.

Hello Jeff

I've reread the FAQ about robots.txt - and my understanding is that site owners put up a robots.txt file to BLOCK the Internet Archive (Alexa) from crawling. If we put up a plain robots.txt file, will the Internet Archive automatically make the archives available and resume crawling? And does that happen immediately?
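
For illustration, here is roughly what a "plain" robots.txt that blocks nothing looks like next to one that blocks a single crawler, checked with Python's standard urllib.robotparser. This is a sketch only: the ia_archiver user-agent name is an assumption based on the Alexa crawler mentioned in the FAQ, the can_fetch helper is made up for this example, and it says nothing about whether or how quickly the Wayback Machine reacts to a new file.

```python
from urllib.robotparser import RobotFileParser

# A permissive file: every crawler may fetch everything.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# A blocking file: the crawler named ia_archiver (assumed name) is shut out.
BLOCK_IA = """\
User-agent: ia_archiver
Disallow: /
"""


def can_fetch(robots_txt, user_agent, url):
    """Check whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.contactpoint.ca/"
    print(can_fetch(ALLOW_ALL, "ia_archiver", page))
    print(can_fetch(BLOCK_IA, "ia_archiver", page))
```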

I say we - because I'm working on research for the organization that owns these two sites.

The other response from Jory2 remarked on the copyright. "All rights reserved" is a fairly standard copyright statement. Would all this be solved if the parent organization sent an email (or something) to Internet Archive granting permission?

Thanking you in advance.

Poster: jory2 Date: Dec 21, 2011 2:17pm
Forum: web Subject: Re: We were unable to get the robots.txt document to display this page.

www.contactpoint.ca and www.counselling.net as you noted are both active.
The website counselling.net has on its page a copyright notice for The Counselling Foundation of Canada 2011, as well as "All rights reserved."
Contact them: info@counselling.net.
The website archive.org has no legal rights to give you access to someone's intellectual property without express consent from the rightful owners.
Same can be said for http://www.contactpoint.ca
Tel: (416) 929-2510 ext.134
Fax: (416) 923-2536
E-mail: contactpoint@ceric.ca

Poster: goharris Date: Dec 21, 2011 4:56pm
Forum: web Subject: Re: We were unable to get the robots.txt document to display this page.

Thank you for the two responses. I'll talk to the organization about adding the robots.txt files. Guess this is something all webmasters need to be aware of.